This is a summary of the presentation given at the Conference on Mass Storage
Systems and Technologies for Space and Earth Science Applications. The
presentation was compiled at the National Center for Atmospheric Research (NCAR),
Boulder, Colorado. NCAR is operated by the University Corporation for
Atmospheric Research and is sponsored by the National Science Foundation. Any
opinions, findings, conclusions, or recommendations expressed in this paper are
those of the author and do not necessarily reflect the views of the National Science
Foundation.
This presentation is designed to
relate some of the experiences of the
Scientific Computing Division at NCAR
in dealing with the "data problem." A brief
history and a development of some basic
Mass Storage System (MSS) principles are
given. An attempt is made to show how
these principles apply to the integration
of various components into NCAR’s MSS.
Future MSS needs for future computing
environments are also discussed.

NCAR provides supercomputing and 
data processing for atmospheric, oceanic 
and related sciences. This service is 
provided for university scientists and for 
scientists located at NCAR. There is a total 
of about 1200 users. 

The data problem for this
community can briefly be summarized as
follows: historical atmospheric data is
archived, programs are saved, and the
output of models of the atmosphere,
oceans and sun is saved. The NCAR storage
experience is based on current
supercomputing megaflop rates, which
result in a number of terabytes being
archived each year. There is a history of
both data growth and file growth. The NCAR
data storage experience has been as
follows: about 500 bytes of information
are archived for each megaflop (one
million floating-point operations) of
computing. When NCAR had an X-MP/48,
the archive rate for the utilized
megaflop compute rate was 3 terabytes
per year. The installation of a Y-MP8/864
increased the archival rate to 6 terabytes
per year. Based on forecasts of future
computing configurations and the
atmospheric models being planned, we now
estimate an archive rate of 30 to 50
terabytes per year by 1993 or 1994.
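
The relationship between compute rate and archive rate described above can be
worked through numerically. The sketch below is illustrative only: it takes the
500 bytes per million floating-point operations from the text, and assumes a
365-day year, decimal terabytes (10^12 bytes), and hypothetical function names.
It shows what sustained compute rate each quoted archive rate would imply under
those assumptions; it is not a statement of measured NCAR machine utilization.

```python
# Assumptions (not from the source): 365-day year, 1 terabyte = 1e12 bytes.
SECONDS_PER_YEAR = 365 * 24 * 3600
TERABYTE = 1e12

# From the text: about 500 bytes archived per million floating-point operations.
BYTES_PER_MILLION_FLOP = 500


def archive_tb_per_year(sustained_mflops):
    """Terabytes archived per year for a sustained rate in million flop/s."""
    million_flop_per_year = sustained_mflops * SECONDS_PER_YEAR
    return million_flop_per_year * BYTES_PER_MILLION_FLOP / TERABYTE


def implied_sustained_mflops(tb_per_year):
    """Sustained million flop/s implied by a yearly archive rate in terabytes."""
    million_flop_per_year = tb_per_year * TERABYTE / BYTES_PER_MILLION_FLOP
    return million_flop_per_year / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Archive rates quoted in the text, in terabytes per year.
    for tb in (3, 6, 30, 50):
        rate = implied_sustained_mflops(tb)
        print(f"{tb:>2} TB/year implies about {rate:.0f} million flop/s sustained")
```

Under these assumptions the archive rate scales linearly with the sustained
compute rate, which is why a tenfold growth in computing translates directly
into a tenfold growth in yearly archive volume.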

Data has been saved in many forms
over NCAR's existence and then migrated
to machine-readable media. Some of the
data has come from handwritten logs,
punch cards, and half-inch tape. All of
this has been collected and is now
archived on IBM 3480 cartridge tape. One
of the basic principles for archiving this
data is to identify certain classes of data.
Archive data is kept forever. Long-term
data is kept for 10 to 15 years. Near-term
data is kept for 1 month to 1 year. A
category called scratch data is deleted after
1 month and cannot be recovered
automatically by the system.
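
The four data classes above can be expressed as a simple retention table. This
is a minimal sketch, not the NCAR MSS implementation: the class and function
names are hypothetical, the upper bound of each retention range is used, and a
30-day month and 365-day year are assumed.

```python
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class DataClass(Enum):
    ARCHIVE = "archive"
    LONG_TERM = "long-term"
    NEAR_TERM = "near-term"
    SCRATCH = "scratch"


# Maximum retention in days; None means "keep forever".
# Assumption: upper bound of each range, 30-day month, 365-day year.
MAX_RETENTION_DAYS = {
    DataClass.ARCHIVE: None,        # kept forever
    DataClass.LONG_TERM: 15 * 365,  # 10 to 15 years
    DataClass.NEAR_TERM: 365,       # 1 month to 1 year
    DataClass.SCRATCH: 30,          # deleted after 1 month
}


def expiry_date(data_class: DataClass, created: date) -> Optional[date]:
    """Return the last day the data is retained, or None if kept forever."""
    days = MAX_RETENTION_DAYS[data_class]
    return None if days is None else created + timedelta(days=days)


def is_expired(data_class: DataClass, created: date, today: date) -> bool:
    """True if the data has passed its retention period as of `today`."""
    expires = expiry_date(data_class, created)
    return expires is not None and today > expires
```

Keeping the policy in one table means a change to a retention period touches a
single entry rather than logic scattered through the system.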

One of the other basic principles
that has been identified is that dataset
sizes continue to grow as a function of
supercomputer sizing. The amount of
data that can be saved is bounded
by media capacities. That is, criteria
are established for determining which
data will be saved, and for how long,
because media capacity is not
infinite. Our experience has
shown that every 10 to 15 years the data
in the MSS will need to be migrated to a
new media base because of changing
systems and obsolescence of existing
media. Usually the media or the drives
can no longer be purchased. This
migration takes place not because the data
is bad on the media, but because the drives
will not be available.

Another problem is that a number 
of companies have provided the capability 
for this massive storage, but the small 
companies tend to disappear within five 
years. The drive components that have 
been furnished for mass data storage 
disappear in five to eight years no matter 
what company they come from. 

The next basic principle is that the
migration of the mass storage system data
to a new media base, which is now several
tens of terabytes, is not a trivial
operation. The migration does not take
place in a short amount of time. For
instance, one-time migrations can run for
long periods of time, on the order of years,
to move terabytes of data. It is very difficult to
guarantee that the data has been migrated
correctly without reading it back, which
is time consuming. These migrations are
very costly and in my opinion shouldn't
be done. We have developed the concept
of "DATA OOZE," and we prefer this
technique over migration right now.
DATA OOZE is a
continuous movement of data within the
system. The data moves across the
storage hierarchy and across the
changing media types under the control
of the MSS. The migration path for this
data in the hierarchy can be from
memory to solid state disk to high speed
disk to disk arrays or farms, and from
there out to some kind of tape. Later on, as
new data storage media become available,
the data is migrated onto these media in
real time, since every day some amount of
the data is migrated as it is being used.
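
The idea that data moves to new media as a side effect of normal use can be
sketched as follows. This is only an illustration of the concept described
above, not the NCAR MSS software: the catalog structure, function names, and
the choice of "3490E" as the current medium are hypothetical, and a real system
would copy and verify the data rather than just update a record.

```python
from dataclasses import dataclass

# Assumption: the medium that newly written data currently lands on.
CURRENT_MEDIA = "3490E"


@dataclass
class FileRecord:
    name: str
    media: str        # medium the file currently resides on
    size_bytes: int


def read_file(record: FileRecord, current_media: str = CURRENT_MEDIA) -> bool:
    """Simulate a user read. If the file is on older media, rewrite it to the
    current media as part of the access. Returns True if the file was moved."""
    # A real system would stage the data from the old medium here.
    if record.media != current_media:
        record.media = current_media  # file is rewritten on the new medium
        return True
    return False


def fraction_on_current_media(catalog, current_media: str = CURRENT_MEDIA) -> float:
    """Fraction of stored bytes that already reside on the current media."""
    total = sum(r.size_bytes for r in catalog)
    if total == 0:
        return 1.0
    moved = sum(r.size_bytes for r in catalog if r.media == current_media)
    return moved / total
```

In this sketch, frequently used files reach the new medium first, and only data
that is never read remains to be handled separately.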

Our conclusions from these
experiences are that new
components and media types should be
integrated according to the following
rules. Use standard components. The
standards may be real or de facto and
apply in the areas of channels, interfaces,
operating systems, media, etc. We look for
media that is easy to obtain and is cost
effective. We look for the long-term
viability of the vendor and multiple
sources for the many system components.
In the area of mass storage system
integration we look at access speeds, ease
of expandability, heterogeneous host
access, maintenance costs, media costs and
system costs.

There are a number of future
growth issues for the NCAR MSS. The
Scientific Computing Division (SCD)
continues to develop future configuration
scenarios. These scenarios try to
anticipate the functional capabilities
we expect to provide for our scientific
community. There are three key
components we need to address: network
services and access, the large scale
computing (Big Iron), and the data
archives. Of course, these all play within
the context of distributed computing.

The near-term issues for the NCAR
MSS focus on some immediate upgrades
which will deal with the MSS growth for a
couple of years. The entire archive will
be migrated onto double density 3490 and
3490-compatible media. The mid-90s to
late 90s become more interesting because
of the expanding interest in archiving
vast data collections.


The issues of future growth will be
centered in three areas of ongoing
development: the various MSS software
packages, the data storage components
and the networks.

The questions then become how all
of these components get assembled and
which ones we plan to use. Will SCD be
able to construct an effective petabyte
MSS by the end of the decade? Which of
our basic principles can we apply to
ensure that such a system can be built?

# # # 
The NCAR Storage Experience

• 500 Bytes per million flop

• Archival rate for model output

- 4 TBytes/year with X-MP/48

- 8 TBytes/year with Y-MP8/864

- 40 TBytes for climate simulation

NCAR Scientific Computing Division
History
(Continued)

Data saved in many forms - then migrated to machine readable media:

Handwritten logs > Punched cards
Punched cards > One-half inch tape
One-half inch tape > AMPEX TBM tape
AMPEX TBM tape > IBM 3480 tape
IBM 3480 tape > IBM 3490-E tape
IBM 3490-E tape > ? ? ?



Basic Principles 

Identification of data classes: 

• Archive data = keep forever 

• Long-term data = keep 10-15 years 

• Near-term data = keep 1 month to 1 year 

• Scratch data = kill after 1 month 


Basic Principles
(Continued) 


• Dataset sizes continue to grow as a function of supercomputer sizing 

• Dataset sizes are constrained in Storage by media capacities 

• Every ten to fifteen years, the data in the MSS will need to be migrated 
to a new media base 


Basic Principles
(Continued) 


Migration:

• Not because the data is bad on the media...

• But because the drives will not be there

• Half life of a start-up company = 5 years
• Half life of drive electronics = 5-8 years


Basic Principles
(Continued) 


The migration of MSS contents ("n" tera-bytes) to new media 
is not a trivial operation 

One-time migrations: 

• Run for long periods of time (years) 

• Are difficult to guarantee 

• Are costly 

• Shouldn't be done 




Basic Principles
(Continued) 


Data OOZE preferred over migration 

Data OOZE is a continuous movement of data within the system: 
• Data movement across: 

- The storage hierarchy 

- The changing media types 

MSS Integration


• Access speeds (sometimes) 

• Ease of expandability 

• Multiple heterogeneous host access 

• Maintenance costs 

• Media costs 

• System cost 



FUTURE GROWTH ISSUES IN THE NCAR MASS STORAGE SYSTEM
Functional Diagram of the NCAR Computing Complex (".UCAR.EDU")

[Figure: functional diagram. Recoverable labels: "Big Iron" (Y-MP8, Y-MP2); Fastpath for Data; MSS; UTAGS; Special Services; Network Services and Access; Gateways (IRJE, MIGS); Foothills Lab; Servers (Email, Math Libs, Documentation); Wide Area Net; Universities.]

FY93-95 Functional Diagram
NCAR MSS

Near Term Upgrades

1. Purchase (IBM) 3490E drives for double density capability

2. Automatic double density migration takes place for shelf archive

3. Hope is STK furnishes double density for drives on ACS in
< 6 months.

(From: Pa trie Sang*)